124 research outputs found

    Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli.

    A significant obstacle in training predictive cell models is the lack of integrated data sources. We develop semi-supervised normalization pipelines and perform experimental characterization (growth, transcriptional, proteome) to create Ecomics, a consistent, quality-controlled multi-omics compendium for Escherichia coli with cohesive meta-data information. We then use this resource to train a multi-scale model that integrates four omics layers to predict genome-wide concentrations and growth dynamics. The genetic and environmental ontology reconstructed from the omics data is substantially different from, and complementary to, the genetic and chemical ontologies. The integration of different layers confers an incremental increase in the prediction performance, as does the information about the known gene regulatory and protein-protein interactions. The predictive performance of the model ranges from 0.54 to 0.87 for the various omics layers, which far exceeds various baselines. This work provides an integrative framework of omics-driven predictive modelling that is broadly applicable to guide biological discovery.

    KGLM: Integrating Knowledge Graph Structure in Language Models for Link Prediction

    The ability of knowledge graphs to represent complex relationships at scale has led to their adoption for various needs including knowledge representation, question-answering, fraud detection, and recommendation systems. Knowledge graphs are often incomplete in the information they represent, necessitating knowledge graph completion tasks such as link and relation prediction. Pre-trained and fine-tuned language models have shown promise in these tasks, although these models ignore the intrinsic information encoded in the knowledge graph, namely the entity and relation types. In this work, we propose the Knowledge Graph Language Model (KGLM) architecture, where we introduce a new entity/relation embedding layer that learns to differentiate distinctive entity and relation types, thereby allowing the model to learn the structure of the knowledge graph. We show that further pre-training the language models with this additional embedding layer using the triples extracted from the knowledge graph, followed by the standard fine-tuning phase, sets a new state-of-the-art performance for the link prediction task on the benchmark datasets.

    A systems biology analysis of brain microvascular endothelial cell lipotoxicity.

    Background: Neurovascular inflammation is associated with a number of neurological diseases including vascular dementia and Alzheimer's disease, which are increasingly important causes of morbidity and mortality around the world. Lipotoxicity is a metabolic disorder that results from accumulation of lipids, particularly fatty acids, in non-adipose tissue leading to cellular dysfunction, lipid droplet formation, and cell death. Results: Our studies indicate for the first time that the neurovascular circulation also can manifest lipotoxicity, which could have major effects on cognitive function. The penetration of integrative systems biology approaches is limited in this area of research, which reduces our capacity to gain an objective insight into the signal transduction and regulation dynamics at a systems level. To address this question, we treated human microvascular endothelial cells with triglyceride-rich lipoprotein (TGRL) lipolysis products and then used genome-wide transcriptional profiling to obtain transcript abundances over four conditions. We then identified regulatory genes and their targets that have been differentially expressed through analysis of the datasets with various statistical methods. We created a functional gene network by exploiting co-expression observations through a guilt-by-association assumption. Concomitantly, we used various network inference algorithms to identify putative regulatory interactions and integrated all predictions to construct a consensus gene regulatory network that is TGRL lipolysis product specific. Conclusion: Systems biology analysis has led to the validation of putative lipid-related targets and the discovery of several genes that may be implicated in lipotoxic-related brain microvascular endothelial cell responses. Here, we report that activating transcription factor 3 (ATF3) is a principal regulator of TGRL lipolysis products-induced gene expression in human brain microvascular endothelial cells.
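
    The guilt-by-association step described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes Pearson correlation as the co-expression measure and an arbitrary cutoff of 0.9, and the gene names and abundances are made up for illustration. Genes whose profiles correlate strongly across conditions are linked, on the assumption that co-expressed genes are functionally related.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.9):
    """Link every gene pair whose |correlation| reaches the threshold.

    The threshold is an illustrative assumption, not a value from the study.
    """
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical transcript abundances over four conditions (not study data).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 1.2],
}

edges = coexpression_edges(profiles)
print(edges)
```

    With only four conditions, as in the study design, correlations of this kind are noisy, which is one reason to combine them with other inference methods into a consensus network.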

    A synthetic biology approach to self-regulatory recombinant protein production in Escherichia coli

    Background: Recombinant protein production is a process of great industrial interest, with products that range from pharmaceuticals to biofuels. Since high level production of recombinant protein imposes significant stress on the host organism, several methods have been developed over the years to optimize protein production. So far, these trial-and-error techniques have proved laborious and sensitive to process parameters, while there has been no attempt to address the problem by applying Synthetic Biology principles and methods, such as integration of standardized parts in novel synthetic circuits. Results: We present a novel self-regulatory protein production system that couples the control of recombinant protein production with a stress-induced, negative feedback mechanism. The synthetic circuit allows the down-regulation of recombinant protein expression through a stress-induced promoter. We used E. coli as the host organism, since it is widely used in recombinant processes. Our results show that the introduction of the self-regulatory circuit increases the soluble/insoluble ratio of recombinant protein at the expense of total protein yield. To further elucidate the dynamics of the system, we developed a computational model that is in agreement with the observed experimental data, and provides insight into the interplay between protein solubility and yield. Conclusion: Our work introduces the idea of a self-regulatory circuit for recombinant protein products, and paves the way for processes with reduced external control or monitoring needs. It demonstrates that the library of standard biological parts serves as a valuable resource for initial synthetic blocks that need to be further refined to be successfully applied in practical problems of biotechnological significance. Finally, the development of a predictive model in conjunction with experimental validation facilitates a better understanding of the underlying dynamics and can be used as a guide to experimental design.
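
    The yield side of the trade-off reported in this abstract can be illustrated with a toy negative-feedback model. This is a sketch under stated assumptions, not the authors' computational model: protein level stands in for host stress, production is repressed through a Hill function, and the parameter values (`k`, `d`, `K`, `n`) are hypothetical. It does not model solubility.

```python
def simulate(feedback, k=10.0, d=1.0, K=2.0, n=2, dt=0.01, steps=2000):
    """Integrate dP/dt = production - d*P with forward Euler.

    All parameters are hypothetical:
      k  maximal production rate
      d  first-order loss (degradation/dilution) rate
      K  protein level at which feedback halves production
      n  Hill coefficient of the repression
    With feedback, production = k / (1 + (P/K)**n); without it, production = k.
    """
    p = 0.0
    for _ in range(steps):
        production = k / (1.0 + (p / K) ** n) if feedback else k
        p += dt * (production - d * p)
    return p


open_loop = simulate(feedback=False)    # settles near k / d
closed_loop = simulate(feedback=True)   # settles at a lower level

print(f"without feedback: {open_loop:.2f}")
print(f"with feedback:    {closed_loop:.2f}")
```

    In this toy setting the self-regulated circuit settles at a lower protein level than the unregulated one, which mirrors the reduced total yield the abstract reports.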

    An integrative, multi-scale, genome-wide model reveals the phenotypic landscape of Escherichia coli.

    Given the vast behavioral repertoire and biological complexity of even the simplest organisms, accurately predicting phenotypes in novel environments and unveiling their biological organization is a challenging endeavor. Here, we present an integrative modeling methodology that unifies under a common framework the various biological processes and their interactions across multiple layers. We trained this methodology on an extensive normalized compendium for the Gram-negative bacterium Escherichia coli, which incorporates gene expression data for genetic and environmental perturbations, transcriptional regulation, signal transduction, and metabolic pathways, as well as growth measurements. Comparison with measured growth and high-throughput data demonstrates the enhanced ability of the integrative model to predict phenotypic outcomes in various environmental and genetic conditions, even in cases where their underlying functions are under-represented in the training set. This work paves the way toward integrative techniques that extract knowledge from a variety of biological data to achieve more than the sum of their parts in the context of prediction, analysis, and redesign of biological systems.